Text Classification into Abstract Classes Based on Discourse Structure
Authors
Abstract
The problem of classifying text as belonging to a document or a meta-document is formulated, and its application areas are proposed. An algorithm is proposed for document classification tasks where word counts are insufficient to differentiate between such abstract classes of text as metalanguage and object level. We extend the parse tree kernel method from the level of individual sentences to the level of paragraphs, based on anaphora, rhetorical structure relations, and communicative actions linking phrases in different sentences. The tree kernel learning technique is applied to these extended trees to leverage additional discourse-related information. We evaluate our approach in the domain of action-plan documents.
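The abstract does not spell out the kernel itself. As a rough illustration only, the Python sketch below assumes the standard Collins-Duffy subset-tree kernel for single sentences, a hypothetical `extend_tree` rule that attaches a phrase linked by an inter-sentence arc (anaphora, a rhetorical relation, or a communicative action) under its anchor node, and a hypothetical `paragraph_kernel` that sums tree kernels over all pairs of sentence and extended trees. The authors' exact extension and combination rules may differ; `Node`, `extend_tree`, `paragraph_kernel`, and the decay parameter `lam` are illustrative names, not taken from the source.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A parse-tree node; nodes without children are words (leaves)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def production(self) -> Tuple[str, ...]:
        # The grammar rule rooted here, e.g. ("S", "NP", "VP").
        return (self.label,) + tuple(c.label for c in self.children)

    def is_preterminal(self) -> bool:
        # A POS tag directly above a single word, e.g. NNP -> "John".
        return len(self.children) == 1 and not self.children[0].children


def internal_nodes(tree: Node) -> List[Node]:
    """All non-leaf nodes of a tree (words themselves are not counted)."""
    result, stack = [], [tree]
    while stack:
        node = stack.pop()
        if node.children:
            result.append(node)
            stack.extend(node.children)
    return result


def tree_kernel(t1: Node, t2: Node, lam: float = 0.5) -> float:
    """Collins-Duffy subset-tree kernel with decay factor `lam`."""
    memo: Dict[Tuple[int, int], float] = {}

    def common(n1: Node, n2: Node) -> float:
        # Weighted count of common tree fragments rooted at n1 and n2.
        if not n1.children or not n2.children:
            return 0.0
        key = (id(n1), id(n2))
        if key not in memo:
            if n1.production() != n2.production():
                memo[key] = 0.0
            elif n1.is_preterminal() and n2.is_preterminal():
                memo[key] = lam
            else:
                score = lam
                for c1, c2 in zip(n1.children, n2.children):
                    score *= 1.0 + common(c1, c2)
                memo[key] = score
        return memo[key]

    return sum(common(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))


def extend_tree(host: Node, anchor: Node, donor: Node) -> Node:
    """Hypothetical extension rule (an assumption, not the paper's exact one):
    copy `host` and attach `donor`, a phrase from another sentence linked to
    `anchor` by a discourse arc, as an extra child of `anchor`."""
    children = [extend_tree(c, anchor, donor) for c in host.children]
    if host is anchor:
        children.append(donor)
    return Node(host.label, children)


def paragraph_kernel(trees_a: List[Node], trees_b: List[Node], lam: float = 0.5) -> float:
    """Assumed combination: sum the tree kernel over all pairs of
    (sentence and extended) trees from the two paragraphs."""
    return sum(tree_kernel(a, b, lam) for a, b in product(trees_a, trees_b))


def leaf(word: str) -> Node:
    return Node(word)


# "John arrived. He left." with an anaphora arc from "He" to "John".
np_john = Node("NP", [Node("NNP", [leaf("John")])])
s1 = Node("S", [np_john, Node("VP", [Node("VBD", [leaf("arrived")])])])
np_he = Node("NP", [Node("PRP", [leaf("He")])])
s2 = Node("S", [np_he, Node("VP", [Node("VBD", [leaf("left")])])])
paragraph = [s1, s2, extend_tree(s2, np_he, np_john)]
print(paragraph_kernel(paragraph, paragraph))
```

In this toy paragraph, the extended tree carries the noun phrase for "John" into the second sentence, so it shares more fragments with the first sentence than the plain parse of "He left." does. Cross-sentence structure of this kind is what the extended trees expose to a kernel-based learner such as an SVM.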
Similar resources
Exploiting Discourse Relations for Sentiment Analysis
The overall sentiment of a text is critically affected by its discourse structure. By splitting a text into text spans with different discourse relations, we automatically train the weights of different relations in accordance with their importance, and then make use of discourse structure knowledge to improve sentiment classification. In this paper, we utilize explicit connectives to predict d...
Classifying Discourse Relations
Mridhula Raghupathy & Hena Mehta. Faculty Advisors: Dr. Aravind Joshi, Dr. Ani Nenkova, & Dr. Alan Lee. Abstract: The goal of this project was to study properties of discourse relations as they appear in the Penn Discourse Tree Bank (PDTB), a large corpus of naturally occurring text whose discourse relations and their fe...
A New Document Embedding Method for News Classification
Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into predefined categories. A large amount of news is spread on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...
The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners
The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...
Going beyond sentences when applying tree kernels
We go beyond the level of individual sentences, applying parse tree kernels to paragraphs. We build a set of extended trees for a paragraph of text from the individual parse trees for sentences, and learn short texts such as search results and social profile postings to take advantage of additional discourse-related information. Extension is based on coreferences and rhetorical structure relations ...